Skip to content

Lazy-allocate error latency histogram on AggregateEntry#11478

Open
dougqh wants to merge 5 commits into
masterfrom
dougqh/lazy-error-latencies
Open

Lazy-allocate error latency histogram on AggregateEntry#11478
dougqh wants to merge 5 commits into
masterfrom
dougqh/lazy-error-latencies

Conversation

@dougqh
Copy link
Copy Markdown
Contributor

@dougqh dougqh commented May 27, 2026

Summary

  • Defer errorLatencies histogram allocation until the first error is recorded on an entry. Most entries never see an error in their lifetime; previously each one carried a ~60-80 byte empty DDSketchHistogram for life.
  • Across a full 2048-entry table, saves ~150 KB if 95% of entries never error (the typical case).
  • SerializingMetricWriter caches the serialized form of an empty histogram (~17 bytes) and emits those cached bytes when an entry's errorLatencies is null, so the wire format is byte-identical to before.

Background

Extracted from #11389, where the same change was bundled with cardinality- and peer-tag-related work. This PR is just the lazy-errorLatencies piece; it sits between #11382 and #11387 so it can ship without depending on the cardinality machinery in #11387.

Trade-off

Entries that do see an error retain the histogram across clear() (cleared, not nulled). An always-erroring entry allocates exactly once. Same total allocation as before for that path.

Throughput benchmarks

This is a heap-footprint change, not a CPU one — the consumer's hot path is unchanged. The bench suite was re-run anyway as a sanity check to confirm no throughput regression vs the #11382 base. Same machine state and JMH config as the rest of the stack's runs (8 producer threads, 2×15s warmup + 5×15s, 1 fork, throughput mode).

Bench (ops/s) v1.62.0 master #11382 this PR (#11478)
Adversarial 444,290 ± 1,616,937 14,276,351 ± 1,091,138 32,556,300 ± 4,321,490 30,609,314 ± 6,944,664
HighCardinalityResource 4,854,335 ± 1,214,233 8,168,005 ± 3,493,716 35,739,452 ± 2,556,684 34,552,088 ± 4,687,212
HighCardinalityPeer 6,902,209 ± 368,641 10,110,142 ± 3,380,594 37,638,634 ± 6,673,337 35,491,425 ± 4,970,576

#11478 vs #11382 is within the per-run error bar on every bench (0.94×–0.97×) — statistically indistinguishable. The CPU-side hot path didn't change: recordOneDuration now calls errorLatenciesForWrite() instead of reading a final field, but that's a single-field-load-and-branch on every entry's first error and a direct field load thereafter, which the JIT inlines flat. aggregateDropped counts are also in line with #11382, confirming the lazy field doesn't perturb the table-cap behavior.

The actual win — the ~150 KB heap reclamation at full table cap when 95% of entries never error — isn't observable in a throughput bench. It would show up in jol-based per-entry footprint inspection (one fewer histogram per entry) or in a long-running profile of allocated-bytes-per-cycle (errorLatencies allocation amortizes from "one per unique key" to "one per unique error-emitting key").

Test plan

  • :dd-trace-core:test — metrics tests pass
  • No behavior change to the client-stats wire payload

🤖 Generated with Claude Code

@datadog-datadog-prod-us1-2
Copy link
Copy Markdown
Contributor

datadog-datadog-prod-us1-2 Bot commented May 27, 2026

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 2 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-java | test_smoke_graalvm: [graalvm21]   View in Datadog   GitLab

🔄 Retry job. This looks flaky and may succeed on retry. Could not read workspace metadata from /go/src/github.com/DataDog/apm-reliability/dd-trace-java/.gradle/caches/8.14.5/groovy-dsl/002e23b2fcf63bd84058c2ef445bc284/metadata.bin due to EOFException.

Check pull requests | Check pull requests   View in Datadog   GitHub Actions

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. Labels check failed: Please add at least one type and one component or instrumentation label to the pull request.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: d99e902 | Docs | Datadog PR Page | Give us feedback!

@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented May 27, 2026

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
Scenario Candidate master Δ (95% CI of mean)
startup:insecure-bank:iast:Agent 14.01 s 14.00 s [-1.0%; +1.2%] (no difference)
startup:insecure-bank:tracing:Agent 12.99 s 12.93 s [-0.3%; +1.4%] (no difference)
startup:petclinic:appsec:Agent 15.75 s 16.44 s [-12.4%; +4.0%] (unstable)
startup:petclinic:iast:Agent 16.57 s 16.66 s [-1.8%; +0.7%] (no difference)
startup:petclinic:profiling:Agent 15.53 s 16.49 s [-14.2%; +2.6%] (unstable)
startup:petclinic:tracing:Agent 15.91 s 15.84 s [-1.1%; +2.0%] (no difference)

Commit: d99e9029 · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

}
}

private byte[] emptyHistogramBytesCache;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be final and initialised early? I don't think it will make a big difference. Also the field should be placed up among the other fields and not among methods for clarity

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — field moved up to the instance fields block. Kept lazy rather than final/eager: Histogram.newHistogram() requires the Histograms factory to be registered, and SerializingMetricWriter is constructed during tracer startup before that registration completes, so eager init would throw. Added an inline comment on the field explaining this. The single-writer invariant (aggregator thread only) means no synchronization is needed either.

(Reply by Claude Sonnet 4.6)

Copy link
Copy Markdown
Contributor Author

@dougqh dougqh Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way that we could address the cycle is by placing this field inside a helper class. That would allow us to make the field final and let the reference be constant propagated.

Base automatically changed from dougqh/optimize-metric-key to master May 29, 2026 15:46
Each AggregateEntry allocated two DDSketchHistograms in its constructor
(ok + error latencies). DDSketchHistogram wraps a DDSketch + lazy store,
roughly 60-80 bytes per histogram even when empty. Most spans aren't
errors, so most entries' errorLatencies sit empty for life.

Now the field starts null. recordOneDuration lazy-allocates on the first
error; if no error ever lands on the entry, it stays null and ~80 bytes
of empty-histogram overhead are reclaimed. Across a full 2048-entry
table that's ~150 KB if 95% of entries never error -- the typical case.

For the wire format, SerializingMetricWriter caches the serialized form
of an empty histogram (~17 bytes) on first use and writes those cached
bytes when an entry's errorLatencies is null. The cache is per-writer
(not a global static) so each writer instance picks up the Histograms
factory state at the time of its first report, avoiding races with test
setup that registers the DDSketch factory at varying points.

Trade-off: entries that DO see an error retain the histogram across
clear() (just cleared, not nulled), so always-erroring entries allocate
exactly once. Same total allocation as before for that case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dougqh dougqh force-pushed the dougqh/lazy-error-latencies branch from f2ee559 to 0c658dd Compare June 1, 2026 15:30
@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented Jun 1, 2026

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

  • Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

dougqh and others added 3 commits June 1, 2026 13:08
…ports

- Move emptyHistogramBytesCache up to the instance fields block
- Import java.nio.ByteBuffer and datadog.metrics.api.Histogram; drop FQNs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dougqh
Copy link
Copy Markdown
Contributor Author

dougqh commented Jun 1, 2026

Addressed @amarziali's two comments:

  1. Field placement / eager init: emptyHistogramBytesCache moved up to the instance fields block. Kept lazy (not final/eager) because Histogram.newHistogram() requires the Histograms factory to be registered, and SerializingMetricWriter is constructed during tracer startup before that registration completes — eager init would throw. Added an inline comment explaining this reasoning. The single-writer invariant (aggregator thread only) means no synchronization is needed on the field either.

  2. FQNs: Imported java.nio.ByteBuffer and datadog.metrics.api.Histogram; replaced FQNs in the method body.

(Comment written by Claude Sonnet 4.6)

@dougqh
Copy link
Copy Markdown
Contributor Author

dougqh commented Jun 1, 2026

I want to collect benchmarking data before merging this. I'm still working on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tag: ai generated Largely based on code generated by an AI or LLM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants